Development of a Multi-User Recognition Engine for Handwritten Bangla Basic Characters and Digits

نویسندگان

  • Sandip Rakshit
  • Debkumar Ghosal
  • Tanmoy Das
  • Subhrajit Dutta
  • Subhadip Basu
چکیده

Abstract: The objective of the paper is to recognize handwritten samples of basic Bangla characters using Tesseract open source Optical Character Recognition (OCR) engine under Apache License 2.0. Handwritten data samples containing isolated Bangla basic characters and digits were collected from different users. Tesseract is trained with user-specific data samples of document pages to generate separate user-models representing a unique language-set. Each such language-set recognizes isolated basic Bangla handwritten test samples collected from the designated users. On a three user model, the system is trained with 919, 928 and 648 isolated handwritten character and digit samples and the performance is tested on 1527, 14116 and 1279 character and digit samples, collected form the test datasets of the three users respectively. The user specific character/digit recognition accuracies were obtained as 90.66%, 91.66% and 96.87% respectively. The overall basic character-level and digit level accuracy of the system is observed as 92.15% and 97.37%. The system fails to segment 12.33% characters and 15.96% digits and also erroneously classifies 7.85% characters and 2.63% on the overall dataset.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Recognition of Handwritten Bangla Basic Characters and Digits using Convex Hull based Feature Set

In dealing with the problem of recognition of handwritten character patterns of varying shapes and sizes, selection of a proper feature set is important to achieve high recognition performance. The current research aims to evaluate the performance of the convex hull based feature set, i.e. 125 features in all computed over different bays attributes of the convex hull of a pattern, for effective...

متن کامل

An MLP based Approach for Recognition of Handwritten 'Bangla' Numerals

The work presented here involves the design of a Multi Layer Perceptron (MLP) based pattern classifier for recognition of handwritten Bangla digits using a 76 element feature vector. Bangla is the second most popular script and language in the Indian subcontinent and the fifth most popular language in the world. The feature set developed for representing handwritten Bangla numerals here include...

متن کامل

BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

BanglaLekha-Isolated, a Bangla handwritten isolated character dataset is presented in this article. This dataset contains 84 different characters comprising of 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 1,66,105 han...

متن کامل

Handwritten Bangla Basic and Compound character recognition using MLP and SVM classifier

A novel approach for recognition of handwritten compound Bangla characters, along with the Basic characters of Bangla alphabet, is presented here. Compared to English like Roman script, one of the major stumbling blocks in Optical Character Recognition (OCR) of handwritten Bangla script is the large number of complex shaped character classes of Bangla alphabet. In addition to 50 basic character...

متن کامل

BanglaLekha-Isolated: A Comprehensive Bangla Handwritten Character Dataset

Bangla handwriting recognition is becoming a very important issue nowadays. It is potentially a very important task specially for Bangla speaking population of Bangladesh and West Bengal. By keeping that in our mind we are introducing a comprehensive Bangla handwritten character dataset named BanglaLekha-Isolated. This dataset contains Bangla handwritten numerals, basic characters and compound ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1003.5897  شماره 

صفحات  -

تاریخ انتشار 2010